Clustering in Relational Data and Ontologies
Abstract
This dissertation studies the problem of clustering objects represented by relational data. The problem is pertinent because many real-world data sets can only be represented by relational data, for which object-based clustering algorithms are not designed. Relational data are encountered in many fields, including biology, management, industrial engineering, and the social sciences. Unlike numerical object data, which are represented by a set of feature values of an object (e.g. height, weight, shoe size), relational object data are the numerical values of (dis)similarity between objects. For this reason, conventional cluster analysis methods such as k-means and fuzzy c-means cannot be used directly with relational data. I focus on three main problems of cluster analysis of relational data: (i) tendency prior to clustering: how many clusters are there?; (ii) partitioning of objects: which objects belong to which cluster?; and (iii) validity of the resultant clusters: are the partitions “good”? This dissertation includes analyses proving that the Visual Assessment of cluster Tendency (VAT) algorithm has a direct relation to single-linkage hierarchical clustering and to Dunn’s cluster validity index. These analyses are important to the development of two novel clustering algorithms: CLODD (CLustering in Ordered Dissimilarity Data) and ReSL (Rectangular Single-Linkage clustering). Also presented in my analysis of VAT is a recursive formulation of the improved VAT (iVAT) algorithm. iVAT is shown to improve the visual evidence of cluster tendency on some types of data for which VAT fails. The computational complexity of my recursive formulation of iVAT is O(n^2), as opposed to O(n^3) for the original formulation, where n is the number of objects considered. CLODD is a clustering algorithm that works on reordered dissimilarity data. Typically, the dissimilarity matrix is reordered with the VAT algorithm; although, I...
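The VAT reordering mentioned in the abstract can be illustrated with a short sketch. The code below is not taken from the dissertation; it follows the commonly published description of VAT (start from one end of the largest dissimilarity, then repeatedly append the unordered object closest to the already ordered set), and the function name `vat_order` and the use of NumPy are assumptions made here for illustration. Because each step adds the object nearest to the ordered set, the procedure visits objects in the order Prim's algorithm would grow a minimal spanning tree, which is consistent with the relation to single-linkage clustering stated above.

```python
import numpy as np

def vat_order(D):
    """Return a VAT-style ordering and the reordered dissimilarity matrix.

    D is assumed to be a square, symmetric matrix of pairwise
    dissimilarities with zeros on the diagonal (hypothetical input
    convention chosen for this sketch).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]

    # Start from one object of the pair with the largest dissimilarity.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = np.ones(n, dtype=bool)
    remaining[i] = False

    # min_dist[q] is the smallest dissimilarity between object q and
    # any object that has already been placed in the ordering.
    min_dist = D[i].copy()

    for _ in range(n - 1):
        # Pick the unordered object closest to the ordered set.
        masked = np.where(remaining, min_dist, np.inf)
        j = int(np.argmin(masked))
        order.append(j)
        remaining[j] = False
        # Update nearest distances with the newly placed object.
        min_dist = np.minimum(min_dist, D[j])

    order = np.array(order)
    return order, D[np.ix_(order, order)]
```

Displaying the reordered matrix as a grayscale image is what gives the visual evidence of cluster tendency: well-separated clusters tend to appear as dark blocks along the diagonal. The loop above does O(n) work per placed object, so the reordering itself costs O(n^2) time.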
Similar resources
A Hybrid Grey based Two Steps Clustering and Firefly Algorithm for Portfolio Selection
Building on the concept of clustering, the main idea of the present study is that the stocks to be chosen and ranked will not necessarily all fall in one cluster. With this in mind, the study offers a new methodology for deciding how to form a portfolio of stocks in the stock market. To this end, Multiple-Criteria Decisi...
Centralized Clustering Method To Increase Accuracy In Ontology Matching Systems
Ontologies are the main infrastructure of the Semantic Web, providing facilities for the integration, searching, and sharing of information on the web. The development of ontologies as the basis of the Semantic Web, together with their heterogeneity, has given rise to ontology matching. With the emergence of large-scale ontologies in real domains, ontology matching systems have faced problems such as memory con...
Integrating Ontological Prior Knowledge into Relational Learning
Ontologies represent an important source of prior information which lends itself to the integration into statistical modeling. This paper discusses approaches towards employing ontological knowledge for relational learning. Our analysis is based on the IHRM model that performs relational learning by including latent variables that can be interpreted as cluster variables of the entities in the d...
A Survey of Ontologies Developed Based on the Principles of Open Biomedical Ontologies
Background and Aim: Ontologies facilitate data integration, exchange, searching and querying. Open Biomedical Ontologies (OBO) Foundry is a solution for creating reference ontologies. In this foundry, the design of ontologies is based on established principles which allow for their interactions as a single system. The purpose of this study is to determine the main features of ontologies develop...
Storing OWL Ontologies in SQL3 Object-Relational Databases
When a large amount of data is stored in OWL files, it is not efficient to maintain and query those data. The OWL syntax is based on XML, which is a meta-markup language. Thus, it is suitable for data description and data exchange, rather than for data storage and data management. Furthermore, enabling multiple users to work with the same ontology in parallel and make modifications mandates the...